Fault Tolerance in Grid Computing: State of the Art and Open Issues

Authors

  • Ritu Garg
  • Awadhesh Kumar Singh

Abstract

Fault tolerance is an important property of large-scale computational grid systems, in which geographically distributed nodes cooperate to execute a task. To achieve a high level of reliability and availability, the grid infrastructure must be foolproof and fault tolerant. Because resource failures can be fatal to job execution, a fault tolerance service is essential for satisfying QoS requirements in grid computing. The most commonly used fault tolerance techniques are job checkpointing and replication. Both reduce the amount of work lost when system availability changes, but both can introduce significant runtime overhead. That overhead depends largely on the length of the checkpointing interval and on the chosen number of replicas, respectively. For complex scientific workflows, in which tasks execute in a well-defined order, reliability is a further major challenge because grid resources are inherently unreliable.
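
The abstract ties checkpointing overhead to the checkpoint interval and replication overhead to the replica count. The sketch below illustrates both trade-offs using standard textbook models, not the authors' own method. It uses Young's first-order approximation for the checkpoint interval and an independent-failure model for replicas and for workflows whose tasks must all succeed. All function names are hypothetical. The independence and exponential-failure assumptions are simplifications that real grids often violate.

```python
import math
from typing import Iterable

# Textbook models for illustration only; not the survey's own method.
# Assumes failures are independent and (for Young's formula) exponentially distributed.


def young_checkpoint_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval, T = sqrt(2 * C * M).

    checkpoint_cost (C) is the time to write one checkpoint and mtbf (M) the
    mean time between failures, both in the same time unit.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def checkpoint_overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of run time lost to checkpointing plus rework.

    C / T is the time spent writing checkpoints per unit of time; T / (2 * M)
    is the expected rework, since a failure lands on average halfway through
    an interval and failures occur at rate 1 / M. Young's T minimizes this sum.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def replication_success_probability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies succeeds."""
    if not 0.0 <= p_fail <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= p_fail <= 1 and replicas >= 1")
    return 1.0 - p_fail ** replicas


def min_replicas(p_fail: float, target: float) -> int:
    """Smallest replica count whose success probability reaches `target`."""
    if not 0.0 <= p_fail < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 <= p_fail < 1 and 0 < target < 1")
    k = 1
    while replication_success_probability(p_fail, k) < target:
        k += 1
    return k


def workflow_reliability(task_fail_probs: Iterable[float], replicas: int = 1) -> float:
    """Reliability of a workflow whose tasks must all succeed (e.g. a chain).

    Each task is replicated `replicas` times; task failures are assumed
    independent, so the workflow reliability is the product over tasks.
    """
    reliability = 1.0
    for p in task_fail_probs:
        reliability *= replication_success_probability(p, replicas)
    return reliability
```

For instance, take a 50 s checkpoint cost and a 10,000 s MTBF. The approximation then gives a 1,000 s interval and a first-order overhead of 10%, and halving or doubling that interval raises the overhead to 12.5%. Likewise, with a per-replica failure probability of 0.2, three replicas are the minimum needed for a success probability of at least 0.99.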

Publication date: 2011